10x Single-Cell Transcriptome Project Final Report
- Project information
- Project ID: HT2021-12392
- Client name: 汪洋
- Species: Human
- Number of samples: 8
- Execution ID: OE0855
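1. Project Overview
1.1. Project Summary
This analysis covers single-cell transcriptome sequencing of 8 samples. The number of high-quality cells per sample reported by Cell Ranger quantification QC ranged from 7856 to 12256. After further QC to remove doublets, multiplets, apoptotic cells, and similar artifacts, the final number of cells per sample ranged from 6062 to 10210. The mean UMI count per cell ranged from 3613 to 4993, the mean number of genes per cell from 1236 to 1497, and the mean mitochondrial gene fraction per cell from 0.0631 to 0.0869. Dimensionality reduction and clustering grouped the cells into 24 clusters. The reference cell types are Neutrophils, Epithelial_cells, Endothelial_cells, B_cell, Monocyte, NK_cell, Macrophage, T_cells, CMP, and Fibroblasts. One differential expression comparison was defined, in which 38 differentially expressed genes were detected.
1.2. Project Quick Links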
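2. Experimental Workflow
The 10x Genomics platform uses microfluidics to encapsulate beads carrying Cell Barcodes together with cells in droplets. Droplets containing cells are collected and the cells are lysed inside them, so that the mRNA from each cell binds to the Cell Barcode on the bead, forming Single Cell GEMs. Reverse transcription is carried out inside the droplets to build the cDNA library, and the sample index on the library sequences identifies the sample of origin of each target sequence.
The library construction and sequencing workflow is shown below: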
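3. Bioinformatics Analysis Workflow
Overview of the standard single-cell transcriptome bioinformatics analysis workflow:
1) Quality assessment of the raw data.
2) Gene quantification and QC with Cell Ranger.
3) Post-quantification QC: filtering low-quality cells, filtering low-abundance genes (depending on the project), etc.
4) Normalization of gene expression.
5) Cell heterogeneity analysis: dimensionality reduction and clustering, marker gene identification, cell type identification, cell subpopulation analysis, and other customized downstream analyses.
6) Gene expression analysis: differential expression analysis, enrichment analysis of differentially expressed genes, and other customized downstream analyses.
The bioinformatics analysis workflow is shown below: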
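4. Analysis Results
4.1. Gene Quantification QC
Samples are quality-controlled with Cell Ranger[1], the official 10x Genomics software, which integrates the STAR[2] aligner. After reads are aligned to the reference genome, QC metrics such as the number of high-quality cells, the number of genes, and the genome mapping rate are derived from the raw data and used to assess the quality of each sample.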
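| Category | Term | Description |
|---|---|---|
| Representative metrics | Estimated Number of Cells | Number of high-quality cells; barcodes whose total UMI count is below 10% of the 99th percentile of total UMI counts across all cells are defined as background noise |
| | Mean Reads per Cell | Mean number of reads per cell |
| | Median Genes per Cell | Median number of genes detected per cell; a gene with a UMI count greater than 0 is counted as detected |
| Sequencing quality | Number of Reads | Number of reads in the raw sequencing data |
| | Valid Barcodes | Percentage of reads with a valid Cell Barcode |
| | Sequencing Saturation | Sequencing saturation |
| | Q30 Bases in Barcode | Percentage of Cell Barcode bases with quality above Q30 |
| | Q30 Bases in RNA Read | Percentage of bases in R2 reads with quality above Q30 |
| | Q30 Bases in UMI | Percentage of UMI bases with quality above Q30 |
| Mapping quality | Reads Mapped to Genome | Percentage of reads mapped to the genome |
| | Reads Mapped Confidently to Genome | Percentage of reads mapped to the genome with high confidence |
| | Reads Mapped Confidently to Intergenic Regions | Percentage of reads mapped to intergenic regions of the reference genome |
| | Reads Mapped Confidently to Intronic Regions | Percentage of reads mapped to intronic regions of the reference genome |
| | Reads Mapped Confidently to Exonic Regions | Percentage of reads mapped to exonic regions of the reference genome |
| | Reads Mapped Confidently to Transcriptome | Percentage of reads mapped to the transcriptome of the reference species |
| | Reads Mapped Antisense to Gene | Percentage of reads mapped to the antisense strand of genes |
| Cell quality | Estimated Number of Cells | Estimated number of high-quality cells detected |
| | Fraction Reads in Cells | Percentage of reads that fall in high-quality cells |
| | Mean Reads per Cell | Mean number of reads per high-quality cell |
| | Median Genes per Cell | Median number of genes per high-quality cell |
| | Total Genes Detected | Total number of genes detected across all cells |
| | Median UMI Counts per Cell | Median UMI count per high-quality cell |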
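The background-noise rule quoted for Estimated Number of Cells (total UMI count below 10% of the 99th percentile) can be illustrated with a minimal Python sketch. It only mirrors the rule as worded in the table and is not Cell Ranger's actual cell-calling code; the function name `call_cells`, the nearest-rank percentile, and the toy barcodes are assumptions made for illustration.

```python
def call_cells(umi_totals, percentile=99, fraction=0.10):
    """Split barcodes into cells and background with a UMI-count cutoff.

    umi_totals: dict mapping barcode -> total UMI count.
    The cutoff is `fraction` times the nearest-rank `percentile` of all totals.
    """
    counts = sorted(umi_totals.values())
    # Nearest-rank percentile, using integer ceiling division to avoid float issues.
    rank = max(1, (percentile * len(counts) + 99) // 100)
    cutoff = fraction * counts[rank - 1]
    cells = {bc for bc, n in umi_totals.items() if n >= cutoff}
    background = set(umi_totals) - cells
    return cutoff, cells, background

# Toy data (hypothetical): 99 barcodes with 1000..1098 UMIs plus one near-empty droplet.
toy = {f"BC{i:03d}": 1000 + i for i in range(99)}
toy["EMPTY"] = 20
cutoff, cells, background = call_cells(toy)
print(cutoff, len(cells), sorted(background))
```

In this toy input only the near-empty droplet falls below the cutoff and is treated as background.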
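4.2. Post-Quantification QC
Single-cell transcriptome sequencing combines the sequenced transcript reads with UMIs and Cell Barcodes to obtain the absolute number of each transcript molecule in a single cell.
4.2.1. Filtering Low-Quality Cells
Building on the initial Cell Ranger QC, the data are quality-controlled further to remove multiplets, doublets, and droplets that did not capture a cell. In theory, the number of expressed genes, the UMI count, and the expression of mitochondrial genes in most cells are concentrated within a certain range. A distribution model can be fitted to these features and used to identify outliers, which are then removed.
The QC criteria for this project are: cells whose gene count and UMI count both fall within the mean ± 2 standard deviations, and whose mitochondrial gene fraction is below 30%, are retained as high-quality cells for downstream analysis.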
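As a rough illustration of these criteria, the following Python sketch keeps cells within mean ± 2 SD for nGene and nUMI and below a 30% mitochondrial fraction. It is a sketch under stated assumptions, not the pipeline's code: the function name `qc_filter`, the use of the sample standard deviation, computing the statistics over whatever set of cells is passed in, and the toy cells are all assumptions.

```python
from statistics import mean, stdev

def qc_filter(cells, n_sd=2.0, max_mito=0.30):
    """Keep cells whose nGene and nUMI lie within mean ± n_sd * SD
    and whose mitochondrial fraction is below max_mito.

    cells: list of dicts with keys 'barcode', 'nGene', 'nUMI', 'percent_mito'.
    """
    bounds = {}
    for key in ("nGene", "nUMI"):
        values = [c[key] for c in cells]
        m, s = mean(values), stdev(values)
        bounds[key] = (m - n_sd * s, m + n_sd * s)
    kept = []
    for c in cells:
        in_range = all(bounds[k][0] <= c[k] <= bounds[k][1] for k in bounds)
        if in_range and c["percent_mito"] < max_mito:
            kept.append(c)
    return kept, bounds

# Toy data (hypothetical): nine ordinary cells, one doublet-like cell,
# and one ordinary cell given a high mitochondrial fraction.
toy = [
    {"barcode": f"c{i}", "nGene": 1200 + 20 * i, "nUMI": 3 * (1200 + 20 * i), "percent_mito": 0.08}
    for i in range(9)
]
toy.append({"barcode": "doublet", "nGene": 4000, "nUMI": 12000, "percent_mito": 0.08})
toy[0]["percent_mito"] = 0.45
kept, bounds = qc_filter(toy)
print([c["barcode"] for c in kept])
```

The doublet-like cell is dropped by the upper nGene/nUMI bound and `c0` by the mitochondrial cutoff, leaving eight cells.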
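The linear model fit, together with violin plots of the number of genes (nGene), number of UMIs (nUMI), and mitochondrial gene fraction (percent_mito) per cell before and after QC, is shown below:
Q: Why is further QC needed after Cell Ranger QC?
The number of cells in a droplet is not always the ideal single cell: a droplet may contain no cell, or two or more cells. In addition, when a cell dies or breaks apart, its mitochondrial gene fraction rises. These confounders must be removed before downstream analysis. One approach is to set a threshold for each of these parameters based on experience and filter the expression matrix with it directly. Another is to detect and remove outliers automatically based on the distribution of each parameter.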
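Figure 4.2.1.2 Violin plots of the mitochondrial gene fraction per cell before and after QC
Figure note: The vertical axis shows the proportion of mitochondrial genes among all genes in a single cell. Each point is the mitochondrial gene fraction of the cell in one water-in-oil droplet, and each violin shows the distribution of that fraction across all cells of the corresponding sample. In general, the lower the mitochondrial gene fraction in most cells, the better (special sample types excepted). Left: before QC. Right: after QC.
Figure 4.2.1.3 Violin plots of the number of expressed genes per cell before and after QC
Figure note: The vertical axis shows the number of expressed genes in a cell. Each point is the gene count of the cell in one water-in-oil droplet. The plot shows the number of expressed genes for every cell in a sample. Points with an abnormally high gene count most likely come from droplets containing more than one cell and should be filtered with a suitable threshold. Left: before QC. Right: after QC.
Figure 4.2.1.4 Violin plots of the number of UMIs per cell before and after QC
Figure note: The vertical axis shows the UMI count. Each point is the UMI count, that is, the number of transcripts, of the cell in one water-in-oil droplet. The plot shows the number of transcripts for every cell in a sample. Points with an abnormally high transcript count most likely come from droplets containing more than one cell and should be filtered with a suitable threshold. Left: before QC. Right: after QC.
Cell statistics before and after post-quantification QC are shown in the table below: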
| sampleid | Mean_nUMI_beforeQC | Mean_nGene_beforeQC | Mean_mito.percent_beforeQC | Total_cells_beforeQC | Mean_nUMI_afterQC | Mean_nGene_afterQC | Mean_mito.percent_afterQC | Total_cells_afterQC |
|---|---|---|---|---|---|---|---|---|
| N1 | 5531.96454857704 | 1549.93351324828 | 0.101585850611978 | 8152 | 3652.99357011635 | 1276.51025719535 | 0.0805236811914321 | 6532 |
| N2 | 7416.62232688391 | 1831.65287678208 | 0.125951936784972 | 7856 | 4008.42873639063 | 1368.81524249423 | 0.0799513516937971 | 6062 |
| N3 | 7037.53456316717 | 1674.09907762462 | 0.108668187324253 | 9649 | 4079.72259628333 | 1307.36506867762 | 0.0637067370244822 | 7426 |
| N4 | 5238.20737712534 | 1541.8086035492 | 0.084169647235691 | 10763 | 3695.01982126489 | 1272.36801099908 | 0.0630570467880452 | 8728 |
| T1 | 6147.06859404199 | 1623.69517423326 | 0.144306427364796 | 9097 | 3974.28170660432 | 1352.06502045587 | 0.0868867637472502 | 6844 |
| T2 | 6024.05815831987 | 1567.63558735287 | 0.143751615314578 | 8666 | 3612.79213483146 | 1236.26029962547 | 0.0814187773519177 | 6408 |
| T3 | 9189.54171315235 | 1806.69988843436 | 0.0917421288863667 | 8067 | 4993.12339693463 | 1321.73522051924 | 0.0650416549705286 | 6394 |
| T4 | 6691.80221932115 | 1834.42158942559 | 0.08291347464677 | 12256 | 4636.46434867777 | 1496.88011753183 | 0.0700312794136979 | 10210 |
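4.3. Dimensionality Reduction and Clustering
The quantification matrix of a single-cell transcriptome is an M × N matrix in which each row typically represents a gene and each column a cell. The matrix for a single sample can reach tens of thousands of rows by tens of thousands of columns. Clustering directly in such a high-dimensional space is computationally expensive and rarely gives good results, so dimensionality reduction is generally performed before clustering. Dimensionality reduction extracts new dimensions from the tens of thousands of genes. Data expressed in these new dimensions retain as much of the information in the sample as possible while reducing redundancy, which makes the subsequent clustering more efficient.
In general, similar cells have similar gene expression profiles, so cells can be grouped by their expression results, with cells of similar profiles forming a cell cluster.
4.3.1. Dimensionality Reduction and Clustering Results
This project uses MNN (mutual nearest neighbors) and t-SNE (t-distributed Stochastic Neighbor Embedding) for dimensionality reduction. The MNN-based result is visualized with t-SNE, and clustering uses the SNN (shared nearest neighbor) algorithm to obtain the optimal cell clusters.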
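To make the idea behind MNN concrete, the sketch below finds mutual nearest neighbor pairs between two small batches of cells in a low-dimensional space. It shows only the pair-finding step, not the subsequent correction, and it is not the implementation used in this project; the function names, Euclidean distance, `k=1`, and the toy coordinates are assumptions.

```python
import math

def nearest(query, reference, k):
    """For each query point, the set of indices of its k nearest reference points."""
    result = []
    for q in query:
        order = sorted(range(len(reference)), key=lambda j: math.dist(q, reference[j]))
        result.append(set(order[:k]))
    return result

def mutual_nearest_pairs(batch_a, batch_b, k=1):
    """Pairs (i, j) where batch_a[i] and batch_b[j] are among each other's k nearest neighbors."""
    a_to_b = nearest(batch_a, batch_b, k)
    b_to_a = nearest(batch_b, batch_a, k)
    return sorted((i, j) for i, nbrs in enumerate(a_to_b) for j in nbrs if i in b_to_a[j])

# Toy coordinates (hypothetical): two batches sharing two cell states.
batch_a = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
batch_b = [(0.5, 0.2), (5.3, 4.8), (20.0, 20.0)]
pairs = mutual_nearest_pairs(batch_a, batch_b)
print(pairs)
```

Cells that have no counterpart in the other batch, such as the third cell of each toy batch, do not form a mutual pair and therefore do not influence the alignment.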
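For the table of 2D coordinates from dimensionality reduction and clustering, see:
For the 3D view of dimensionality reduction and clustering, see:
4.3.2. Dimensionality Reduction and Clustering by Sample
The dimensionality reduction and clustering plot grouped by sample is shown below:
The bar chart of the sample composition of each cell cluster is shown below:
The faceted dimensionality reduction and clustering plot for the samples is shown below:
The bar chart of the cluster composition of each sample is shown below:
The number of cells from each sample in each cell cluster is listed in the table below: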
| sampleid | clusters | cell_number | freq |
|---|---|---|---|
| N1 | 1 | 1668 | 25.535823637477 |
| N1 | 2 | 1076 | 16.4727495407226 |
| N1 | 3 | 440 | 6.7360685854256 |
| N1 | 4 | 12 | 0.183710961420698 |
| N1 | 5 | 405 | 6.20024494794856 |
| N1 | 6 | 1141 | 17.4678505817514 |
| N1 | 7 | 504 | 7.71586037966932 |
| N1 | 8 | 36 | 0.551132884262094 |
| N1 | 9 | 103 | 1.57685241886099 |
| N1 | 10 | 33 | 0.50520514390692 |
| N1 | 11 | 91 | 1.39314145744029 |
| N1 | 12 | 156 | 2.38824249846908 |
| N1 | 13 | 230 | 3.52112676056338 |
| N1 | 14 | 2 | 0.0306184935701164 |
| N1 | 15 | 15 | 0.229638701775873 |
| N1 | 16 | 220 | 3.3680342927128 |
| N1 | 17 | 80 | 1.22473974280465 |
| N1 | 18 | 47 | 0.719534598897734 |
| N1 | 19 | 187 | 2.86282914880588 |
| N1 | 20 | 10 | 0.153092467850582 |
| N1 | 21 | 64 | 0.979791794243723 |
| N1 | 23 | 9 | 0.137783221065524 |
| N1 | 24 | 3 | 0.0459277403551745 |
| N2 | 1 | 1161 | 19.1520950181458 |
| N2 | 2 | 1361 | 22.4513361926757 |
| N2 | 3 | 232 | 3.82711976245464 |
| N2 | 4 | 130 | 2.14450676344441 |
| N2 | 5 | 12 | 0.197954470471791 |
| N2 | 6 | 952 | 15.7043879907621 |
| N2 | 7 | 375 | 6.18607720224348 |
| N2 | 8 | 107 | 1.76509402837347 |
| N2 | 9 | 112 | 1.84757505773672 |
| N2 | 10 | 18 | 0.296931705707687 |
| N2 | 11 | 867 | 14.3022104915869 |
| N2 | 12 | 235 | 3.87660838007258 |
| N2 | 13 | 118 | 1.94655229297262 |
| N2 | 14 | 13 | 0.214450676344441 |
| N2 | 15 | 52 | 0.857802705377763 |
| N2 | 16 | 172 | 2.83734741009568 |
| N2 | 17 | 61 | 1.00626855823161 |
| N2 | 18 | 6 | 0.0989772352358957 |
| N2 | 19 | 58 | 0.956779940613659 |
| N2 | 20 | 5 | 0.0824810293632465 |
| N2 | 21 | 4 | 0.0659848234905972 |
| N2 | 22 | 2 | 0.0329924117452986 |
| N2 | 23 | 4 | 0.0659848234905972 |
| N2 | 24 | 5 | 0.0824810293632465 |
| N3 | 1 | 1133 | 15.2572044169135 |
| N3 | 2 | 493 | 6.63883652033396 |
| N3 | 3 | 1205 | 16.2267708052788 |
| N3 | 4 | 1638 | 22.0576353353084 |
| N3 | 5 | 192 | 2.58551036897388 |
| N3 | 6 | 224 | 3.01642876380285 |
| N3 | 7 | 249 | 3.35308375976299 |
| N3 | 8 | 444 | 5.97899272825209 |
| N3 | 9 | 152 | 2.04686237543765 |
| N3 | 10 | 371 | 4.99596014004848 |
| N3 | 11 | 70 | 0.942633988688392 |
| N3 | 12 | 123 | 1.65634258012389 |
| N3 | 13 | 194 | 2.61244276865069 |
| N3 | 14 | 208 | 2.80096956638836 |
| N3 | 15 | 88 | 1.18502558577969 |
| N3 | 16 | 75 | 1.00996498788042 |
| N3 | 17 | 184 | 2.47778077026663 |
| N3 | 18 | 156 | 2.10072717479127 |
| N3 | 19 | 67 | 0.902235389173175 |
| N3 | 20 | 20 | 0.269323996768112 |
| N3 | 21 | 27 | 0.363587395636951 |
| N3 | 22 | 64 | 0.861836789657959 |
| N3 | 23 | 34 | 0.457850794505791 |
| N3 | 24 | 15 | 0.201992997576084 |
| N4 | 1 | 330 | 3.78093492208983 |
| N4 | 2 | 604 | 6.92025664527956 |
| N4 | 3 | 1640 | 18.7901008249313 |
| N4 | 4 | 1380 | 15.8111824014665 |
| N4 | 5 | 1237 | 14.172777268561 |
| N4 | 6 | 60 | 0.687442713107241 |
| N4 | 7 | 417 | 4.77772685609533 |
| N4 | 8 | 328 | 3.75802016498625 |
| N4 | 9 | 196 | 2.24564619615032 |
| N4 | 10 | 802 | 9.18881759853346 |
| N4 | 11 | 81 | 0.928047662694776 |
| N4 | 12 | 76 | 0.870760769935839 |
| N4 | 13 | 32 | 0.366636113657195 |
| N4 | 14 | 506 | 5.7974335472044 |
| N4 | 15 | 269 | 3.0820348304308 |
| N4 | 16 | 42 | 0.481209899175069 |
| N4 | 17 | 117 | 1.34051329055912 |
| N4 | 18 | 359 | 4.11319890009166 |
| N4 | 19 | 55 | 0.630155820348304 |
| N4 | 20 | 67 | 0.767644362969752 |
| N4 | 21 | 49 | 0.56141154903758 |
| N4 | 22 | 20 | 0.229147571035747 |
| N4 | 23 | 31 | 0.355178735105408 |
| N4 | 24 | 30 | 0.343721356553621 |
| T1 | 1 | 588 | 8.59146697837522 |
| T1 | 2 | 1091 | 15.9409701928697 |
| T1 | 3 | 1258 | 18.3810637054354 |
| T1 | 4 | 61 | 0.891291642314436 |
| T1 | 5 | 1118 | 16.3354763296318 |
| T1 | 6 | 97 | 1.41729982466394 |
| T1 | 7 | 770 | 11.2507305669199 |
| T1 | 8 | 539 | 7.87551139684395 |
| T1 | 9 | 46 | 0.672121566335476 |
| T1 | 10 | 339 | 4.95324371712449 |
| T1 | 11 | 149 | 2.177089421391 |
| T1 | 12 | 99 | 1.44652250146113 |
| T1 | 13 | 42 | 0.613676212741087 |
| T1 | 14 | 60 | 0.876680303915839 |
| T1 | 15 | 74 | 1.0812390414962 |
| T1 | 16 | 6 | 0.0876680303915839 |
| T1 | 17 | 121 | 1.76797194623027 |
| T1 | 18 | 102 | 1.49035651665693 |
| T1 | 19 | 65 | 0.949736995908825 |
| T1 | 20 | 171 | 2.49853886616014 |
| T1 | 21 | 37 | 0.5406195207481 |
| T1 | 22 | 4 | 0.0584453535943892 |
| T1 | 23 | 5 | 0.0730566919929866 |
| T1 | 24 | 2 | 0.0292226767971946 |
| T2 | 1 | 1379 | 21.519975031211 |
| T2 | 2 | 1354 | 21.1298377028714 |
| T2 | 3 | 136 | 2.12234706616729 |
| T2 | 4 | 50 | 0.780274656679151 |
| T2 | 5 | 195 | 3.04307116104869 |
| T2 | 6 | 1558 | 24.3133583021223 |
| T2 | 7 | 254 | 3.96379525593009 |
| T2 | 8 | 32 | 0.499375780274657 |
| T2 | 9 | 473 | 7.38139825218477 |
| T2 | 10 | 13 | 0.202871410736579 |
| T2 | 11 | 158 | 2.46566791510612 |
| T2 | 12 | 64 | 0.998751560549313 |
| T2 | 13 | 84 | 1.31086142322097 |
| T2 | 14 | 14 | 0.218476903870162 |
| T2 | 15 | 67 | 1.04556803995006 |
| T2 | 16 | 268 | 4.18227215980025 |
| T2 | 17 | 84 | 1.31086142322097 |
| T2 | 18 | 65 | 1.0143570536829 |
| T2 | 19 | 132 | 2.05992509363296 |
| T2 | 20 | 2 | 0.031210986267166 |
| T2 | 21 | 18 | 0.280898876404494 |
| T2 | 22 | 2 | 0.031210986267166 |
| T2 | 23 | 2 | 0.031210986267166 |
| T2 | 24 | 4 | 0.0624219725343321 |
| T3 | 1 | 1100 | 17.2036284016265 |
| T3 | 2 | 422 | 6.59993744135127 |
| T3 | 3 | 210 | 3.28432905849234 |
| T3 | 4 | 1693 | 26.4779480763215 |
| T3 | 5 | 29 | 0.453550203315608 |
| T3 | 6 | 273 | 4.26962777604004 |
| T3 | 7 | 125 | 1.95495777291211 |
| T3 | 8 | 23 | 0.359712230215827 |
| T3 | 9 | 1142 | 17.860494213325 |
| T3 | 10 | 21 | 0.328432905849234 |
| T3 | 11 | 20 | 0.312793243665937 |
| T3 | 12 | 30 | 0.469189865498905 |
| T3 | 13 | 530 | 8.28902095714733 |
| T3 | 14 | 62 | 0.969659055364404 |
| T3 | 15 | 119 | 1.86111979981232 |
| T3 | 16 | 219 | 3.42508601814201 |
| T3 | 17 | 240 | 3.75351892399124 |
| T3 | 18 | 7 | 0.109477635283078 |
| T3 | 19 | 75 | 1.17297466374726 |
| T3 | 20 | 8 | 0.125117297466375 |
| T3 | 21 | 7 | 0.109477635283078 |
| T3 | 22 | 26 | 0.406631216765718 |
| T3 | 23 | 11 | 0.172036284016265 |
| T3 | 24 | 2 | 0.0312793243665937 |
| T4 | 1 | 348 | 3.40842311459354 |
| T4 | 2 | 1049 | 10.2742409402547 |
| T4 | 3 | 1434 | 14.0450538687561 |
| T4 | 4 | 813 | 7.96278158667973 |
| T4 | 5 | 1555 | 15.230166503428 |
| T4 | 6 | 153 | 1.49853085210578 |
| T4 | 7 | 396 | 3.87855044074437 |
| T4 | 8 | 1579 | 15.4652301665034 |
| T4 | 9 | 96 | 0.940254652301665 |
| T4 | 10 | 453 | 4.43682664054848 |
| T4 | 11 | 206 | 2.01762977473066 |
| T4 | 12 | 662 | 6.48383937316356 |
| T4 | 13 | 44 | 0.430950048971596 |
| T4 | 14 | 381 | 3.73163565132223 |
| T4 | 15 | 539 | 5.27913809990206 |
| T4 | 16 | 74 | 0.724779627815867 |
| T4 | 17 | 178 | 1.743388834476 |
| T4 | 18 | 110 | 1.07737512242899 |
| T4 | 19 | 32 | 0.313418217433888 |
| T4 | 20 | 10 | 0.0979431929480901 |
| T4 | 21 | 46 | 0.450538687561214 |
| T4 | 22 | 39 | 0.381978452497551 |
| T4 | 23 | 10 | 0.0979431929480901 |
| T4 | 24 | 3 | 0.029382957884427 |
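In the table above, `freq` matches the cluster's cell count as a percentage of all post-QC cells in that sample (for example, 1668 / 6532 ≈ 25.54% for N1 cluster 1). A minimal sketch of that calculation follows; the function name `cluster_frequencies` and the input layout are assumptions, and cluster 0 is a placeholder that pools the remaining N1 cells.

```python
from collections import Counter, defaultdict

def cluster_frequencies(assignments):
    """assignments: iterable of (sampleid, cluster) pairs, one per cell.

    Returns one row per (sample, cluster) with the cell count and its
    percentage of that sample's cells.
    """
    counts = Counter(assignments)
    totals = defaultdict(int)
    for (sample, _cluster), n in counts.items():
        totals[sample] += n
    return [
        {"sampleid": s, "clusters": c, "cell_number": n, "freq": 100.0 * n / totals[s]}
        for (s, c), n in sorted(counts.items())
    ]

# Counts for N1 clusters 1 and 2 come from the table; cluster 0 is a placeholder
# pooling the other N1 cells so that the sample total is 6532.
cells = [("N1", 1)] * 1668 + [("N1", 2)] * 1076 + [("N1", 0)] * (6532 - 1668 - 1076)
rows = cluster_frequencies(cells)
for row in rows:
    print(row)
```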
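For the detailed results directory, see:
4.4. Similarity Between Cell Clusters
The Pearson correlation coefficient of mean gene expression between cell clusters shows how similar two clusters are.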
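The comparison can be sketched in a few lines of Python: compute the Pearson correlation between the mean expression vectors of every pair of clusters. This is an illustrative sketch only; the function names and the toy mean-expression values are assumptions, and a real analysis would use the full gene set.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_similarity(mean_expr):
    """mean_expr: dict cluster -> list of mean expression per gene (same gene order)."""
    clusters = sorted(mean_expr)
    return {(a, b): pearson(mean_expr[a], mean_expr[b]) for a in clusters for b in clusters}

# Toy mean expression for four genes in three clusters (hypothetical values).
mean_expr = {
    "1": [5.0, 0.2, 3.1, 0.0],
    "2": [4.8, 0.3, 2.9, 0.1],
    "3": [0.1, 4.9, 0.2, 3.8],
}
sim = cluster_similarity(mean_expr)
print(round(sim[("1", "2")], 4), round(sim[("1", "3")], 4))
```

Clusters 1 and 2 have nearly identical profiles and correlate close to 1, while cluster 3 is negatively correlated with both.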
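For the table of mean gene expression in each cell cluster, see:
4.5. Marker Gene Identification
A marker gene is defined as a gene that is highly expressed in the vast majority of cells in a given cluster, expressed in only a small fraction of cells in the other clusters, and significantly up-regulated in that cluster relative to the others.
The bimod test is used to test for differential expression between a given cluster and all remaining clusters, which yields the specific marker genes of each cluster.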
| gene | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj | cluster | gene_diff | ensemble_id | gene_type | gene_description | TFs_Family | GO_id | GO_term | pathway | pathway_description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IGHG1 | 0 | 4.71538601365689 | 0.505 | 0.055 | 0 | 13 | 9.182 | ENSG00000211896 | IG_C_gene | immunoglobulin heavy constant gamma 1 (G1m marker) [Source:HGNC Symbol;Acc:HGNC:5525] | -- | GO:0016020,GO:0016021,GO:0005615,GO:0005886,GO:0005515,GO:0005576,GO:0002376,GO:0002250,GO:0019814,GO:0009897,GO:0006956,GO:0006958,GO:0030449,GO:0038096,GO:0003823,GO:0034987,GO:0006910,GO:0006911,GO:0042742,GO:0045087,GO:0050853,GO:0050871,GO:0042571,GO:0070062,GO:0072562,GO:0019221 | membrane; integral component of membrane; extracellular space; plasma membrane; protein binding; extracellular region; immune system process; adaptive immune response; immunoglobulin complex; external side of plasma membrane; complement activation; complement activation, classical pathway; regulation of complement activation; Fc-gamma receptor signaling pathway involved in phagocytosis; antigen binding; immunoglobulin receptor binding; phagocytosis, recognition; phagocytosis, engulfment; defense response to bacterium; innate immune response; B cell receptor signaling pathway; positive regulation of B cell activation; immunoglobulin complex, circulating; extracellular exosome; blood microparticle; cytokine-mediated signaling pathway | -- | -- |
| IGLC1 | 0 | 4.71299804368826 | 0.563 | 0.069 | 0 | 13 | 8.159 | ENSG00000211675 | IG_C_gene | immunoglobulin lambda constant 1 [Source:HGNC Symbol;Acc:HGNC:5855] | -- | -- | -- | -- | -- |
| JCHAIN | 0 | 4.18698697834188 | 0.788 | 0.04 | 0 | 13 | 19.7 | ENSG00000132465 | protein_coding | joining chain of multimeric IgA and IgM [Source:HGNC Symbol;Acc:HGNC:5713] | -- | GO:0005615,GO:0042803,GO:0005576,GO:0006955,GO:0045087,GO:0002250,GO:0006898,GO:0050900,GO:0034987,GO:0031210,GO:0070062,GO:0001895,GO:0072562,GO:0019731,GO:0003823,GO:0032461,GO:0003697,GO:0042834,GO:0019862,GO:0003094,GO:0060267,GO:0071748,GO:0071750,GO:0071751,GO:0071752,GO:0071756,GO:0030674 | extracellular space; protein homodimerization activity; extracellular region; immune response; innate immune response; adaptive immune response; receptor-mediated endocytosis; leukocyte migration; immunoglobulin receptor binding; phosphatidylcholine binding; extracellular exosome; retina homeostasis; blood microparticle; antibacterial humoral response; antigen binding; positive regulation of protein oligomerization; single-stranded DNA binding; peptidoglycan binding; IgA binding; glomerular filtration; positive regulation of respiratory burst; monomeric IgA immunoglobulin complex; dimeric IgA immunoglobulin complex; secretory IgA immunoglobulin complex; secretory dimeric IgA immunoglobulin complex; pentameric IgM immunoglobulin complex; protein binding, bridging | -- | -- |
| IGHA2 | 0 | 3.04387778217638 | 0.352 | 0.018 | 0 | 13 | 19.556 | ENSG00000211890 | IG_C_gene | immunoglobulin heavy constant alpha 2 (A2m marker) [Source:HGNC Symbol;Acc:HGNC:5479] | -- | GO:0016020,GO:0005615,GO:0005886,GO:0005576,GO:0006955,GO:0002376,GO:0002250,GO:0019814,GO:0009897,GO:0006898,GO:0050900,GO:0003823,GO:0034987,GO:0006910,GO:0006911,GO:0006958,GO:0042742,GO:0045087,GO:0050853,GO:0050871,GO:0042571,GO:0070062,GO:0001895,GO:0072562,GO:0019731,GO:0003094,GO:0060267,GO:0071748,GO:0071751,GO:0071752 | membrane; extracellular space; plasma membrane; extracellular region; immune response; immune system process; adaptive immune response; immunoglobulin complex; external side of plasma membrane; receptor-mediated endocytosis; leukocyte migration; antigen binding; immunoglobulin receptor binding; phagocytosis, recognition; phagocytosis, engulfment; complement activation, classical pathway; defense response to bacterium; innate immune response; B cell receptor signaling pathway; positive regulation of B cell activation; immunoglobulin complex, circulating; extracellular exosome; retina homeostasis; blood microparticle; antibacterial humoral response; glomerular filtration; positive regulation of respiratory burst; monomeric IgA immunoglobulin complex; secretory IgA immunoglobulin complex; secretory dimeric IgA immunoglobulin complex | -- | -- |
| FCGR3B | 0 | 3.0015207278033 | 0.752 | 0.017 | 0 | 4 | 44.235 | ENSG00000162747 | protein_coding | Fc fragment of IgG receptor IIIb [Source:HGNC Symbol;Acc:HGNC:3620] | -- | GO:0016020,GO:0016021,GO:0005886,GO:0005576,GO:0031225,GO:0070062,GO:0043312,GO:0006955,GO:0030667,GO:0019864 | membrane; integral component of membrane; plasma membrane; extracellular region; anchored component of membrane; extracellular exosome; neutrophil degranulation; immune response; secretory granule membrane; IgG binding | path:hsa04145,path:hsa04380,path:hsa04650,path:hsa05140,path:hsa05150,path:hsa05152,path:hsa05322 | Phagosome; Osteoclast differentiation; Natural killer cell mediated cytotoxicity; Leishmaniasis; Staphylococcus aureus infection; Tuberculosis; Systemic lupus erythematosus |
| FCN1 | 0 | 2.5559003515036 | 0.967 | 0.031 | 0 | 14 | 31.194 | ENSG00000085265 | protein_coding | ficolin 1 [Source:HGNC Symbol;Acc:HGNC:3623] | -- | GO:0016020,GO:0046872,GO:0030246,GO:0005886,GO:0005515,GO:0005576,GO:0005581,GO:0007186,GO:0002376,GO:0045087,GO:0006956,GO:0062023,GO:0043312,GO:2000484,GO:0001867,GO:0001664,GO:0034774,GO:1904813,GO:0033691,GO:0008329,GO:0097367,GO:0002752,GO:0034394,GO:0043654,GO:0046597,GO:0031232 | membrane; metal ion binding; carbohydrate binding; plasma membrane; protein binding; extracellular region; collagen trimer; G protein-coupled receptor signaling pathway; immune system process; innate immune response; complement activation; collagen-containing extracellular matrix; neutrophil degranulation; positive regulation of interleukin-8 secretion; complement activation, lectin pathway; G protein-coupled receptor binding; secretory granule lumen; ficolin-1-rich granule lumen; sialic acid binding; signaling pattern recognition receptor activity; carbohydrate derivative binding; cell surface pattern recognition receptor signaling pathway; protein localization to cell surface; recognition of apoptotic cell; negative regulation of viral entry into host cell; extrinsic component of external side of plasma membrane | -- | -- |
| MZB1 | 0 | 2.53601813500461 | 0.861 | 0.019 | 0 | 13 | 45.316 | ENSG00000170476 | protein_coding | marginal zone B and B1 cell specific protein [Source:HGNC Symbol;Acc:HGNC:30125] | -- | GO:0005737,GO:0005783,GO:0005576,GO:0006915,GO:0005788,GO:0042127,GO:0033622,GO:0046626,GO:0034663,GO:0002642,GO:0030888,GO:0008284 | cytoplasm; endoplasmic reticulum; extracellular region; apoptotic process; endoplasmic reticulum lumen; regulation of cell population proliferation; integrin activation; regulation of insulin receptor signaling pathway; endoplasmic reticulum chaperone complex; positive regulation of immunoglobulin biosynthetic process; regulation of B cell proliferation; positive regulation of cell population proliferation | -- | -- |
| MS4A2 | 0 | 2.47974755619008 | 0.933 | 0.004 | 0 | 7 | 233.25 | ENSG00000149534 | protein_coding | membrane spanning 4-domains A2 [Source:HGNC Symbol;Acc:HGNC:7316] | -- | GO:0016020,GO:0016021,GO:0005887,GO:0006955,GO:0006954,GO:0038095,GO:0005886,GO:0032998,GO:0019863,GO:0007165,GO:0007166,GO:0009897 | membrane; integral component of membrane; integral component of plasma membrane; immune response; inflammatory response; Fc-epsilon receptor signaling pathway; plasma membrane; Fc-epsilon receptor I complex; IgE binding; signal transduction; cell surface receptor signaling pathway; external side of plasma membrane | path:hsa04071,path:hsa04072,path:hsa04664,path:hsa05310 | Sphingolipid signaling pathway; Phospholipase D signaling pathway; Fc epsilon RI signaling pathway; Asthma |
| GZMB | 0 | 2.42482953932267 | 0.893 | 0.046 | 0 | 3 | 19.413 | ENSG00000100453 | protein_coding | granzyme B [Source:HGNC Symbol;Acc:HGNC:4709] | -- | GO:0016787,GO:0008233,GO:0006508,GO:0004252,GO:0008236,GO:0005739,GO:0005634,GO:0016020,GO:0005515,GO:0006915,GO:0005737,GO:0019835,GO:0005829,GO:1900740,GO:0001772,GO:0042267,GO:0008626 | hydrolase activity; peptidase activity; proteolysis; serine-type endopeptidase activity; serine-type peptidase activity; mitochondrion; nucleus; membrane; protein binding; apoptotic process; cytoplasm; cytolysis; cytosol; positive regulation of protein insertion into mitochondrial membrane involved in apoptotic signaling pathway; immunological synapse; natural killer cell mediated cytotoxicity; granzyme-mediated apoptotic signaling pathway | path:hsa04210,path:hsa04650,path:hsa04940,path:hsa05202,path:hsa05320,path:hsa05330,path:hsa05332 | Apoptosis; Natural killer cell mediated cytotoxicity; Type I diabetes mellitus; Transcriptional misregulation in cancer; Autoimmune thyroid disease; Allograft rejection; Graft-versus-host disease |
| FGFBP2 | 0 | 2.31448693662621 | 0.799 | 0.013 | 0 | 3 | 61.462 | ENSG00000137441 | protein_coding | fibroblast growth factor binding protein 2 [Source:HGNC Symbol;Acc:HGNC:29451] | -- | GO:0005615,GO:0005576,GO:0019838,GO:0007267 | extracellular space; extracellular region; growth factor binding; cell-cell signaling | -- | -- |
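In the rows shown above, `gene_diff` equals `pct.1 / pct.2` rounded to three decimals (for example, 0.505 / 0.055 ≈ 9.182). The sketch below reproduces that ratio and picks the top markers per cluster. Ranking by `avg_logFC` is an assumption, since the report does not state how the Top10 list is ordered, and the function names are hypothetical.

```python
from collections import defaultdict

def pct_ratio(pct_1, pct_2, digits=3):
    """Ratio of the fraction of expressing cells inside vs. outside the cluster."""
    return round(pct_1 / pct_2, digits)

def top_markers(rows, n=10):
    """Group marker rows by cluster and keep the n genes with the largest avg_logFC.

    rows: iterable of (gene, avg_logFC, pct_1, pct_2, cluster) tuples.
    """
    by_cluster = defaultdict(list)
    for gene, avg_logfc, _pct_1, _pct_2, cluster in rows:
        by_cluster[cluster].append((avg_logfc, gene))
    return {
        cluster: [gene for _fc, gene in sorted(items, reverse=True)[:n]]
        for cluster, items in by_cluster.items()
    }

# Values copied from the marker table above.
rows = [
    ("IGHG1", 4.71538601365689, 0.505, 0.055, 13),
    ("IGLC1", 4.71299804368826, 0.563, 0.069, 13),
    ("JCHAIN", 4.18698697834188, 0.788, 0.04, 13),
    ("FCGR3B", 3.0015207278033, 0.752, 0.017, 4),
    ("MS4A2", 2.47974755619008, 0.933, 0.004, 7),
]
ratios = {gene: pct_ratio(p1, p2) for gene, _fc, p1, p2, _cl in rows}
top2 = top_markers(rows, n=2)
print(ratios)
print(top2[13])
```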
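Table of all marker genes for each cell cluster:
Table of the top 10 marker genes for each cell cluster:
The top 10 marker genes of each cell cluster are visualized below:
4.6. Cell Type Identification
There are currently two mainstream approaches to cell type identification: manual identification based on specific marker genes, and automatic identification based on single-cell reference expression datasets. Both yield cell type assignments, and each has strengths and weaknesses. The former is limited by existing annotations linking marker genes to cell types and is subject to considerable subjective bias. The latter removes the researcher's subjective bias and can effectively identify cell subtypes, but it is likewise limited by the annotation source and data quality of the reference dataset. As more single-cell expression data become available, the latter approach will be able to resolve increasingly fine cell types.
This project uses the SingleR[3] package with the HPCA[4] reference dataset for cell type annotation. The method computes the correlation between the expression profiles in the reference dataset and the expression profile of each cell to be identified, and annotates the cell as the reference cell type with the highest correlation.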
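The core idea, assigning each cell the reference label with the highest correlation, can be sketched as follows. This is a simplified illustration and not the SingleR implementation: Spearman rank correlation is used here as an assumption, marker gene selection and fine-tuning are omitted, and the gene order and reference values are invented toy data.

```python
import math

def ranks(values):
    """1-based ranks with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / denom if denom else 0.0

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def annotate(cell_expr, reference):
    """Return the best-correlated reference label and all scores."""
    scores = {label: spearman(cell_expr, profile) for label, profile in reference.items()}
    return max(scores, key=scores.get), scores

# Toy reference profiles over four genes (hypothetical values).
reference = {
    "T_cells": [9.0, 3.0, 0.1, 0.2],
    "NK_cell": [2.0, 9.5, 0.1, 0.3],
    "Epithelial_cells": [0.1, 0.2, 8.0, 1.0],
    "Fibroblasts": [0.2, 0.1, 0.5, 9.0],
}
cell = [7.5, 2.0, 0.0, 0.3]
best, scores = annotate(cell, reference)
print(best, {k: round(v, 2) for k, v in scores.items()})
```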
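The dataset-based identification results in this report are for reference only. The characteristics of each cell cluster can subsequently be described and validated using relevant genes reported in the literature.
For the reference results of dataset-based cell type identification, see:
For the cell type identification correlation heatmap, see:
Figure 4.6.2 Correlation heatmap for cell type identification
Figure note: Each row is a cell type annotation in the reference dataset, and each column is a cell to be identified. The redder the color, the higher the correlation, meaning the cell is more similar to that cell type in the reference dataset.
For the cell type annotation table, see: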
| Barcode | sampleid | celltype | clusters | group |
|---|---|---|---|---|
| AAACCCAAGAATACAC-1 | N1 | T_cells | 1 | N |
| AAACCCAAGGTCACCC-1 | N1 | Epithelial_cells | 16 | N |
| AAACCCACATGATGCT-1 | N1 | Epithelial_cells | 16 | N |
| AAACCCATCTAGGCCG-1 | N1 | T_cells | 2 | N |
| AAACCCATCTATCCAT-1 | N1 | T_cells | 2 | N |
| AAACGAAAGGCCTGAA-1 | N1 | NK_cell | 3 | N |
| AAACGAAAGGGCAGAG-1 | N1 | T_cells | 2 | N |
| AAACGAAGTATTTCGG-1 | N1 | Fibroblasts | 12 | N |
| AAACGAATCCCAGGAC-1 | N1 | T_cells | 2 | N |
| AAACGAATCTCCGTGT-1 | N1 | T_cells | 5 | N |
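For the table of original cell type counts in each cell cluster, see:
4.7. Differentially Expressed Gene Screening
Differentially expressed genes are selected by fold change (FoldChange) and significance test (pvalue) results. The MAST test is used by default.
For detailed results, see directory: 7.Diffexp
The number of differentially expressed genes in each comparison is shown in the table below: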
| Case | Control | Up_diff | Down_diff | Total_diff(pvalue<0.05&FoldChange>1.5) |
|---|---|---|---|---|
| T(group) | N(group) | 31 | 7 | 38 |
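The thresholds in the table header (pvalue < 0.05 and FoldChange > 1.5) can be applied as in the sketch below. Treating FoldChange < 1/1.5 as the cutoff for down-regulated genes is an assumption, because the report does not state it; the function names and the `GENE_X`, `GENE_Y`, and `GENE_Z` rows are hypothetical, while the SFTPC and HSPA1A values come from the example results in this section.

```python
import math

def classify(pvalue, fold_change, p_cut=0.05, fc_cut=1.5):
    """Return 'Up', 'Down', or None for a single gene."""
    if pvalue >= p_cut:
        return None
    if fold_change > fc_cut:
        return "Up"
    if fold_change < 1.0 / fc_cut:  # assumed symmetric cutoff for down-regulation
        return "Down"
    return None

def summarize(rows):
    """rows: iterable of (gene, pvalue, fold_change). Counts Up and Down genes."""
    labels = [classify(p, fc) for _gene, p, fc in rows]
    up, down = labels.count("Up"), labels.count("Down")
    return {"Up_diff": up, "Down_diff": down, "Total_diff": up + down}

rows = [
    ("SFTPC", 0.0, 2.25309849173371),   # from the example results
    ("HSPA1A", 0.0, 2.1121228824058),   # from the example results
    ("GENE_X", 0.001, 0.5),             # hypothetical down-regulated gene
    ("GENE_Y", 0.2, 3.0),               # hypothetical, fails the p-value cutoff
    ("GENE_Z", 0.01, 1.2),              # hypothetical, fold change too small
]
summary = summarize(rows)
# log2FoldChange is the base-2 logarithm of FoldChange.
log2_fc = math.log2(2.25309849173371)
print(summary, round(log2_fc, 6))
```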
Example results for significantly differentially expressed genes:
| gene | pvalue | pct.1 | pct.2 | padj | baseMean | FoldChange | log2FoldChange | up_down | ensemble_id | gene_type | gene_description | TFs_Family | GO_id | GO_term | pathway | pathway_description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SFTPC | 0 | 0.682 | 0.592 | 0 | 0.865973540631567 | 2.25309849173371 | 1.17191038078596 | Up | ENSG00000168484 | protein_coding | surfactant protein C [Source:HGNC Symbol;Acc:HGNC:10802] | -- | GO:0016020,GO:0016021,GO:0005576,GO:0005615,GO:0005515,GO:0042802,GO:0007585,GO:0005789,GO:0044267,GO:0045334,GO:0097486,GO:0042599 | membrane; integral component of membrane; extracellular region; extracellular space; protein binding; identical protein binding; respiratory gaseous exchange by respiratory system; endoplasmic reticulum membrane; cellular protein metabolic process; clathrin-coated endocytic vesicle; multivesicular body lumen; lamellar body | -- | -- |
| HSPA1A | 0 | 0.591 | 0.396 | 0 | 0.221467814996896 | 2.1121228824058 | 1.07869377251846 | Up | ENSG00000204389 | protein_coding | heat shock protein family A (Hsp70) member 1A [Source:HGNC Symbol;Acc:HGNC:5232] | -- | GO:0010628,GO:0048471,GO:0003723,GO:0005634,GO:0005737,GO:0016607,GO:0000166,GO:0005524,GO:0043066,GO:0005783,GO:0005739,GO:0005829,GO:0005856,GO:0005515,GO:0046718,GO:0005925,GO:0005813,GO:0005815,GO:0005576,GO:0051082,GO:0042026,GO:0016887,GO:0005886,GO:0016192,GO:0034605,GO:0031625,GO:0006986,GO:0003714,GO:0005654,GO:0045296,GO:0070062,GO:0050821,GO:0005814,GO:0008285,GO:0008180,GO:0072562,GO:0043488,GO:0032991,GO:1900034,GO:0043312,GO:1990904,GO:0032436,GO:0005102,GO:0047485,GO:0051092,GO:0030308,GO:1904813,GO:0031982,GO:0019899,GO:2001240,GO:0001618,GO:0007041,GO:0030512,GO:0016235,GO:0042826,GO:0034599,GO:0060548,GO:1901673,GO:1902236,GO:0042623,GO:0001664,GO:0034620,GO:0031072,GO:0090084,GO:0051085,GO:0097718,GO:0006402,GO:0097201,GO:0032757,GO:0046034,GO:0031396,GO:0051131,GO:0031249,GO:0044183,GO:0051787,GO:0055131,GO:0031397,GO:0033120,GO:0045648,GO:0070370,GO:0070434,GO:0090063,GO:1901029,GO:1902380,GO:1903265,GO:0016234 | positive regulation of gene expression; perinuclear region of cytoplasm; RNA binding; nucleus; cytoplasm; nuclear speck; nucleotide binding; ATP binding; negative regulation of apoptotic process; endoplasmic reticulum; mitochondrion; cytosol; cytoskeleton; protein binding; viral entry into host cell; focal adhesion; centrosome; microtubule organizing center; extracellular region; unfolded protein binding; protein refolding; ATPase activity; plasma membrane; vesicle-mediated transport; cellular response to heat; ubiquitin protein ligase binding; response to unfolded protein; transcription corepressor activity; nucleoplasm; cadherin binding; extracellular exosome; protein stabilization; centriole; negative regulation of cell population proliferation; COP9 signalosome; blood microparticle; regulation of mRNA stability; protein-containing complex; regulation of cellular response to heat; neutrophil degranulation; ribonucleoprotein complex; positive regulation of proteasomal ubiquitin-dependent protein catabolic process; signaling receptor binding; protein N-terminus binding; positive regulation of NF-kappaB transcription factor activity; negative regulation of cell growth; ficolin-1-rich granule lumen; vesicle; enzyme binding; negative regulation of extrinsic apoptotic signaling pathway in absence of ligand; virus receptor activity; lysosomal transport; negative regulation of transforming growth factor beta receptor signaling pathway; aggresome; histone deacetylase binding; cellular response to oxidative stress; negative regulation of cell death; regulation of mitotic spindle assembly; negative regulation of endoplasmic reticulum stress-induced intrinsic apoptotic signaling pathway; ATPase activity, coupled; G protein-coupled receptor binding; cellular response to unfolded protein; heat shock protein binding; negative regulation of inclusion body assembly; chaperone cofactor-dependent protein refolding; disordered domain specific binding; mRNA catabolic process; negative regulation of transcription from RNA polymerase II promoter in response to stress; positive regulation of interleukin-8 production; ATP metabolic process; regulation of protein ubiquitination; chaperone-mediated protein complex assembly; denatured protein binding; protein folding chaperone; misfolded protein binding; C3HC4-type RING finger domain binding; negative regulation of protein ubiquitination; positive regulation of RNA splicing; positive regulation of erythrocyte differentiation; cellular heat acclimation; positive regulation of nucleotide-binding oligomerization domain containing 2 signaling pathway; positive regulation of microtubule nucleation; negative regulation of mitochondrial outer membrane permeabilization involved in apoptotic signaling pathway; positive regulation of endoribonuclease activity; positive regulation of tumor necrosis factor-mediated signaling pathway; inclusion body | path:hsa03040,path:hsa04010,path:hsa04141,path:hsa04144,path:hsa04213,path:hsa04612,path:hsa04915,path:hsa05020,path:hsa05134,path:hsa05145,path:hsa05162,path:hsa05164 | Spliceosome; MAPK signaling pathway; Protein processing in endoplasmic reticulum; Endocytosis; Longevity regulating pathway - multiple species; Antigen processing and presentation; Estrogen signaling pathway; Prion diseases; Legionellosis; Toxoplasmosis; Measles; Influenza A |
| SFTPB | 0 | 0.504 | 0.292 | 0 | -0.1005994075552 | 2.02396476070806 | 1.01718417145796 | Up | ENSG00000168878 | protein_coding | surfactant protein B [Source:HGNC Symbol;Acc:HGNC:10801] | -- | GO:0006629,GO:0006665,GO:0005615,GO:0005764,GO:0005576,GO:0005789,GO:0005654,GO:0044267,GO:0009887,GO:0045334,GO:0005771,GO:0007585,GO:0097486,GO:0042599,GO:0097208 | lipid metabolic process; sphingolipid metabolic process; extracellular space; lysosome; extracellular region; endoplasmic reticulum membrane; nucleoplasm; cellular protein metabolic process; animal organ morphogenesis; clathrin-coated endocytic vesicle; multivesicular body; respiratory gaseous exchange by respiratory system; multivesicular body lumen; lamellar body; alveolar lamellar body | -- | -- |
| SFTPA1 | 0 | 0.38 | 0.199 | 0 | -0.61709854892462 | 1.91260247284977 | 0.935537046164366 | Up | ENSG00000122852 | protein_coding | surfactant protein A1 [Source:HGNC Symbol;Acc:HGNC:10798] | -- | GO:0030246,GO:0005615,GO:0005515,GO:0005576,GO:0005581,GO:0006869,GO:0005789,GO:0044267,GO:0002224,GO:0045334,GO:0005319,GO:0048029,GO:0007585,GO:0042599,GO:0032502,GO:0008228 | carbohydrate binding; extracellular space; protein binding; extracellular region; collagen trimer; lipid transport; endoplasmic reticulum membrane; cellular protein metabolic process; toll-like receptor signaling pathway; clathrin-coated endocytic vesicle; lipid transporter activity; monosaccharide binding; respiratory gaseous exchange by respiratory system; lamellar body; developmental process; opsonization | path:hsa04145,path:hsa05133 | Phagosome; Pertussis |
| PGC | 0 | 0.232 | 0.092 | 0 | -1.74678549845595 | 1.82861416927879 | 0.870750703961402 | Up | ENSG00000096088 | protein_coding | progastricsin [Source:HGNC Symbol;Acc:HGNC:8890] | -- | GO:0016787,GO:0005615,GO:0008233,GO:0006508,GO:0005576,GO:0004190,GO:0030163,GO:0007586,GO:0002803 | hydrolase activity; extracellular space; peptidase activity; proteolysis; extracellular region; aspartic-type endopeptidase activity; protein catabolic process; digestion; positive regulation of antibacterial peptide production | -- | -- |
| SFTPA2 | 0 | 0.365 | 0.203 | 0 | -0.645804484433407 | 1.77730182624439 | 0.82968870457977 | Up | ENSG00000185303 | protein_coding | surfactant protein A2 [Source:HGNC Symbol;Acc:HGNC:10799] | -- | GO:0030246,GO:0005615,GO:0005576,GO:0005581,GO:0005789,GO:0044267,GO:0002224,GO:0045334,GO:0048029,GO:0007585,GO:0042599,GO:0032502 | carbohydrate binding; extracellular space; extracellular region; collagen trimer; endoplasmic reticulum membrane; cellular protein metabolic process; toll-like receptor signaling pathway; clathrin-coated endocytic vesicle; monosaccharide binding; respiratory gaseous exchange by respiratory system; lamellar body; developmental process | path:hsa04145,path:hsa05133 | Phagosome; Pertussis |
| HSPA1B | 0 | 0.462 | 0.298 | 0 | -0.337821274326878 | 1.75072359829428 | 0.807951331146355 | Up | ENSG00000204388 | protein_coding | heat shock protein family A (Hsp70) member 1B [Source:HGNC Symbol;Acc:HGNC:5233] | -- | GO:0010628,GO:0048471,GO:0003723,GO:0005634,GO:0005737,GO:0016607,GO:0000166,GO:0005524,GO:0043066,GO:0005783,GO:0005739,GO:0005829,GO:0005856,GO:0005515,GO:0046718,GO:0005925,GO:0005813,GO:0005815,GO:0005576,GO:0051082,GO:0042026,GO:0016887,GO:0005886,GO:0016192,GO:0034605,GO:0031625,GO:0005654,GO:0070062,GO:0050821,GO:0005814,GO:0008285,GO:0072562,GO:0032991,GO:1900034,GO:0043312,GO:1990904,GO:0032436,GO:0005102,GO:0047485,GO:0051092,GO:0030308,GO:1904813,GO:0031982,GO:0019899,GO:2001240,GO:0001618,GO:0016235,GO:0042826,GO:0034599,GO:0060548,GO:1901673,GO:0042623,GO:0001664,GO:0031072,GO:0090084,GO:0051085,GO:0006402,GO:0032757,GO:0046034,GO:0031396,GO:0008180,GO:0044183,GO:0051787,GO:0055131,GO:0006986,GO:0031397,GO:0034620,GO:0045648,GO:0070370,GO:0070434,GO:0090063,GO:1903265,GO:0016234 | positive regulation of gene expression; perinuclear region of cytoplasm; RNA binding; nucleus; cytoplasm; nuclear speck; nucleotide binding; ATP binding; negative regulation of apoptotic process; endoplasmic reticulum; mitochondrion; cytosol; cytoskeleton; protein binding; viral entry into host cell; focal adhesion; centrosome; microtubule organizing center; extracellular region; unfolded protein binding; protein refolding; ATPase activity; plasma membrane; vesicle-mediated transport; cellular response to heat; ubiquitin protein ligase binding; nucleoplasm; extracellular exosome; protein stabilization; centriole; negative regulation of cell population proliferation; blood microparticle; protein-containing complex; regulation of cellular response to heat; neutrophil degranulation; ribonucleoprotein complex; positive regulation of proteasomal ubiquitin-dependent protein catabolic process; signaling receptor binding; protein N-terminus binding; positive regulation of NF-kappaB transcription factor activity; negative regulation of cell growth; ficolin-1-rich granule lumen; vesicle; enzyme binding; negative regulation of extrinsic apoptotic signaling pathway in absence of ligand; virus receptor activity; aggresome; histone deacetylase binding; cellular response to oxidative stress; negative regulation of cell death; regulation of mitotic spindle assembly; ATPase activity, coupled; G protein-coupled receptor binding; heat shock protein binding; negative regulation of inclusion body assembly; chaperone cofactor-dependent protein refolding; mRNA catabolic process; positive regulation of interleukin-8 production; ATP metabolic process; regulation of protein ubiquitination; COP9 signalosome; protein folding chaperone; misfolded protein binding; C3HC4-type RING finger domain binding; response to unfolded protein; negative regulation of protein ubiquitination; cellular response to unfolded protein; positive regulation of erythrocyte differentiation; cellular heat acclimation; positive regulation of nucleotide-binding oligomerization domain containing 2 signaling pathway; positive regulation of microtubule nucleation; positive regulation of tumor necrosis factor-mediated signaling pathway; inclusion body
| path:hsa03040,path:hsa04010,path:hsa04141,path:hsa04144,path:hsa04213,path:hsa04612,path:hsa04915,path:hsa05134,path:hsa05145,path:hsa05162,path:hsa05164
| Spliceosome|MAPK signaling pathway|Protein processing in endoplasmic reticulum|Endocytosis|Longevity regulating pathway - multiple species|Antigen processing and presentation|Estrogen signaling pathway|Legionellosis|Toxoplasmosis|Measles|Influenza A
|
| DNAJB1 | 0 | 0.634 | 0.526 | 0 | 0.376319318843641 | 1.71840478118671 | 0.781069912509965 | Up | ENSG00000132002 | protein_coding | DnaJ heat shock protein family (Hsp40) member B1 [Source:HGNC Symbol;Acc:HGNC:5270]
| -- | GO:0005737,GO:0005634,GO:0005829,GO:0006457,GO:0005515,GO:0051082,GO:0005730,GO:0003714,GO:0005654,GO:0045296,GO:0070062,GO:1900034,GO:0051117,GO:0051087,GO:0006986,GO:0030544,GO:0001671,GO:0090084,GO:0051085,GO:0097201,GO:0032781,GO:0044183,GO:0030900,GO:0014069,GO:0043025,GO:0043197,GO:0061827,GO:0098794,GO:0098978
| cytoplasm|nucleus|cytosol|protein folding|protein binding|unfolded protein binding|nucleolus|transcription corepressor activity|nucleoplasm|cadherin binding|extracellular exosome|regulation of cellular response to heat|ATPase binding|chaperone binding|response to unfolded protein|Hsp70 protein binding|ATPase activator activity|negative regulation of inclusion body assembly|chaperone cofactor-dependent protein refolding|negative regulation of transcription from RNA polymerase II promoter in response to stress|positive regulation of ATPase activity|protein folding chaperone|forebrain development|postsynaptic density|neuronal cell body|dendritic spine|sperm head|postsynapse|glutamatergic synapse
| path:hsa04141,path:hsa05164 | Protein processing in endoplasmic reticulum|Influenza A
|
| HSP90AA1 | 3.3643044481136e-313 | 0.868 | 0.825 | 1.23136907105406e-308 | 1.36206833795565 | 1.5409781296886 | 0.623846386568014 | Up | ENSG00000080824 | protein_coding | heat shock protein 90 alpha family class A member 1 [Source:HGNC Symbol;Acc:HGNC:5253]
| -- | GO:0051897,GO:0001934,GO:1902949,GO:0043025,GO:0003723,GO:0005634,GO:0005737,GO:0016020,GO:0000166,GO:0005524,GO:0005829,GO:0006457,GO:0042803,GO:0005886,GO:0005515,GO:0007165,GO:0042470,GO:0009409,GO:0051082,GO:0009408,GO:0006898,GO:0038096,GO:0005576,GO:0042802,GO:0009986,GO:0034605,GO:0006986,GO:0005654,GO:0070062,GO:0050821,GO:0016887,GO:0032991,GO:1900034,GO:0000086,GO:0010389,GO:0097711,GO:0043312,GO:0046677,GO:0034774,GO:1904813,GO:0019221,GO:0050999,GO:0030911,GO:0048471,GO:0048010,GO:0048156,GO:0038128,GO:0051973,GO:0007004,GO:0042826,GO:0044295,GO:0042026,GO:0033138,GO:0097110,GO:0051020,GO:0097718,GO:0030235,GO:0071682,GO:1990782,GO:0023026,GO:0031625,GO:0042623,GO:0070182,GO:0006839,GO:0021955,GO:0030010,GO:0031396,GO:0032273,GO:0043254,GO:0043335,GO:0045040,GO:0045429,GO:0048675,GO:0051131,GO:0051186,GO:0061684,GO:1903364,GO:1903827,GO:1905323,GO:0043202,GO:0043209,GO:0044294,GO:0044183
| positive regulation of protein kinase B signaling|positive regulation of protein phosphorylation|positive regulation of tau-protein kinase activity|neuronal cell body|RNA binding|nucleus|cytoplasm|membrane|nucleotide binding|ATP binding|cytosol|protein folding|protein homodimerization activity|plasma membrane|protein binding|signal transduction|melanosome|response to cold|unfolded protein binding|response to heat|receptor-mediated endocytosis|Fc-gamma receptor signaling pathway involved in phagocytosis|extracellular region|identical protein binding|cell surface|cellular response to heat|response to unfolded protein|nucleoplasm|extracellular exosome|protein stabilization|ATPase activity|protein-containing complex|regulation of cellular response to heat|G2/M transition of mitotic cell cycle|regulation of G2/M transition of mitotic cell cycle|ciliary basal body-plasma membrane docking|neutrophil degranulation|response to antibiotic|secretory granule lumen|ficolin-1-rich granule lumen|cytokine-mediated signaling pathway|regulation of nitric-oxide synthase activity|TPR domain binding|perinuclear region of cytoplasm|vascular endothelial growth factor receptor signaling pathway|tau protein binding|ERBB2 signaling pathway|positive regulation of telomerase activity|telomere maintenance via telomerase|histone deacetylase binding|axonal growth cone|protein refolding|positive regulation of peptidyl-serine phosphorylation|scaffold protein binding|GTPase binding|disordered domain specific binding|nitric-oxide synthase regulator activity|endocytic vesicle lumen|protein tyrosine kinase binding|MHC class II protein complex binding|ubiquitin protein ligase binding|ATPase activity, coupled|DNA polymerase binding|mitochondrial transport|central nervous system neuron axonogenesis|establishment of cell polarity|regulation of protein ubiquitination|positive regulation of protein polymerization|regulation of protein complex assembly|protein unfolding|protein insertion into mitochondrial 
outer membrane|positive regulation of nitric oxide biosynthetic process|axon extension|chaperone-mediated protein complex assembly|cofactor metabolic process|chaperone-mediated autophagy|positive regulation of cellular protein catabolic process|regulation of cellular protein localization|telomerase holoenzyme complex assembly|lysosomal lumen|myelin sheath|dendritic growth cone|protein folding chaperone
| path:hsa04141,path:hsa04151,path:hsa04217,path:hsa04612,path:hsa04621,path:hsa04657,path:hsa04659,path:hsa04914,path:hsa04915,path:hsa05200,path:hsa05215,path:hsa05418
| Protein processing in endoplasmic reticulum|PI3K-Akt signaling pathway|Necroptosis|Antigen processing and presentation|NOD-like receptor signaling pathway|IL-17 signaling pathway|Th17 cell differentiation|Progesterone-mediated oocyte maturation|Estrogen signaling pathway|Pathways in cancer|Prostate cancer|Fluid shear stress and atherosclerosis
|
| APOC1 | 3.22797477971125e-303 | 0.361 | 0.251 | 1.18147104912211e-298 | -0.747948701252588 | 2.39382064076411 | 1.25931506106426 | Up | ENSG00000130208 | protein_coding | apolipoprotein C1 [Source:HGNC Symbol;Acc:HGNC:607]
| -- | GO:0045717,GO:0051005,GO:0005576,GO:0006641,GO:0008203,GO:0042157,GO:0043085,GO:0006869,GO:0034361,GO:0005783,GO:0031210,GO:0006629,GO:0034364,GO:0033344,GO:0050995,GO:0034379,GO:0034375,GO:0033700,GO:0010873,GO:0045833,GO:0060228,GO:0034382,GO:0034447,GO:0042627,GO:0055102,GO:0032375,GO:0004859,GO:0005504,GO:0010900,GO:0010916,GO:0032374,GO:0034369,GO:0048261
| negative regulation of fatty acid biosynthetic process|negative regulation of lipoprotein lipase activity|extracellular region|triglyceride metabolic process|cholesterol metabolic process|lipoprotein metabolic process|positive regulation of catalytic activity|lipid transport|very-low-density lipoprotein particle|endoplasmic reticulum|phosphatidylcholine binding|lipid metabolic process|high-density lipoprotein particle|cholesterol efflux|negative regulation of lipid catabolic process|very-low-density lipoprotein particle assembly|high-density lipoprotein particle remodeling|phospholipid efflux|positive regulation of cholesterol esterification|negative regulation of lipid metabolic process|phosphatidylcholine-sterol O-acyltransferase activator activity|chylomicron remnant clearance|very-low-density lipoprotein particle clearance|chylomicron|lipase inhibitor activity|negative regulation of cholesterol transport|phospholipase inhibitor activity|fatty acid binding|negative regulation of phosphatidylcholine catabolic process|negative regulation of very-low-density lipoprotein particle clearance|regulation of cholesterol transport|plasma lipoprotein particle remodeling|negative regulation of receptor-mediated endocytosis
| path:hsa04979 | Cholesterol metabolism |
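The differentially expressed genes are sorted by fold change (FoldChange) from largest to smallest, and 25 upregulated and 25 downregulated genes are selected to draw a heatmap:
Figure 4.7 Heatmap of the top 25 up- and downregulated differential genes
Figure legend: the horizontal axis shows the comparison groups and the vertical axis shows the top 25 up- and downregulated genes (if there are fewer than 25 up- or downregulated differential genes, all of them are plotted; mitochondrial and ribosomal genes are excluded from the plot by default). Yellow indicates high expression and purple indicates low expression.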
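The gene selection described for the heatmap can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the record keys (`gene`, `fold_change`, `regulation`), the function names, and the gene-symbol prefixes used to recognise mitochondrial and ribosomal genes are all assumptions. It also assumes that "top" downregulated genes are those with the smallest fold change.

```python
def is_plottable(symbol):
    # Assumption: human mitochondrial genes start with "MT-" and
    # ribosomal protein genes start with "RPS" or "RPL".
    return not symbol.upper().startswith(("MT-", "RPS", "RPL"))


def top_up_down(records, k=25):
    """Return (up, down) lists with at most k records each.

    records: list of dicts with hypothetical keys
             'gene', 'fold_change' and 'regulation' ("Up" or "Down").
    """
    kept = [r for r in records if is_plottable(r["gene"])]
    # Sort by fold change from largest to smallest.
    ranked = sorted(kept, key=lambda r: r["fold_change"], reverse=True)
    up = [r for r in ranked if r["regulation"] == "Up"][:k]
    # The most strongly downregulated genes sit at the end of the ranking.
    down = [r for r in ranked if r["regulation"] == "Down"][-k:]
    return up, down


if __name__ == "__main__":
    # Made-up records, for illustration only.
    demo = [
        {"gene": "geneA", "fold_change": 3.0, "regulation": "Up"},
        {"gene": "MT-CO1", "fold_change": 5.0, "regulation": "Up"},
        {"gene": "geneB", "fold_change": 0.4, "regulation": "Down"},
    ]
    up, down = top_up_down(demo)
    print([r["gene"] for r in up], [r["gene"] for r in down])
```

If a group has fewer than `k` genes, the slices simply return all of them, which matches the "plot all genes" rule in the figure legend.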
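4.8. Differential gene enrichment analysis
4.8.1. GO enrichment analysis of differential genes
The GO (Gene Ontology)[5] database was established by the Gene Ontology Consortium. It classifies and summarizes gene-related research findings from around the world, standardizes the biological terms used for genes and gene products across databases, and gives a unified definition and description of gene and protein function. With the GO database, genes and gene products can be annotated in three categories:
1) BP (Biological Process): the biological processes they take part in;
2) MF (Molecular Function): the molecular functions they perform;
3) CC (Cellular Component): the cellular components they belong to.
GO enrichment analysis of the differentially expressed genes identifies which biological functions or cellular pathways the differential genes are associated with under different conditions.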
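GO enrichment method: all protein-coding genes form the background list, and the differentially expressed protein-coding genes form the candidate list drawn from that background. A hypergeometric test gives a p value indicating whether a GO term is significantly enriched in the candidate list, and the p values are then corrected for multiple testing with the Benjamini & Hochberg procedure to obtain the qValue.
Figure 4.8.1.1 Formulas for the hypergeometric test p value and the Enrichment score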
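The formula figure itself is not reproduced in this text. As a hedged reconstruction, the usual upper-tail hypergeometric p value and a ratio-style enrichment score, written with the symbols N, n, M and m defined in the next paragraph, would read as below. The enrichment-score form is an assumption, although it does reproduce the Enrichment_score column of the example table (for ListHits = 2, ListTotal = 38, PopHits = 2, PopTotal = 19400, it gives (2/38)/(2/19400) ≈ 510.53).

$$
P = 1 - \sum_{i=0}^{m-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}
$$

$$
\text{Enrichment score} = \frac{m/n}{M/N}
$$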
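Here N is the number of genes with GO annotation among all genes; n is the number of differentially expressed genes in N that have GO annotation; M is the number of genes annotated to a specific GO Term among all genes; and m is the number of differentially expressed genes annotated to that GO Term. Genes for follow-up study can be selected by combining the GO results with their biological significance.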
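The calculation can be illustrated with a short, standard-library-only sketch. It assumes the conventional upper-tail hypergeometric test P(X ≥ m) and an enrichment score of the form (m/n)/(M/N); neither the function names nor this exact form are taken from the pipeline, so treat it as an approximation of the method described above rather than its implementation.

```python
from math import comb


def hypergeom_pvalue(m, n, M, N):
    """Upper-tail probability P(X >= m).

    N: annotated background genes, M: background genes in the term,
    n: annotated differential genes, m: differential genes in the term.
    """
    total = comb(N, n)
    upper = min(n, M)
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return tail / total


def enrichment_score(m, n, M, N):
    # Assumed form: fraction of hits in the list over fraction in the background.
    return (m / n) / (M / N)


def benjamini_hochberg(pvals):
    """Benjamini & Hochberg adjusted p values, returned in the input order."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])
    adjusted = [0.0] * k
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(k, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * k / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Counts taken from the first row of the example table below
    # (ListHits=2, ListTotal=38, PopHits=2, PopTotal=19400).
    print(enrichment_score(2, 38, 2, 19400))  # approximately 510.53
    print(hypergeom_pvalue(2, 38, 2, 19400))
```

Exact integer binomials via `math.comb` keep the sketch dependency-free; for large term lists a library routine such as a hypergeometric survival function would be the more practical choice.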
Output directory: 8.enrichment/GO_enrichment
Example GO enrichment results:
| id | term | category | ListHits | ListTotal | PopHits | PopTotal | pval | padj | Enrichment_score | Gene |
|---|---|---|---|---|---|---|---|---|---|---|
| GO:0070488 | neutrophil aggregation | biological_process | 2 | 38 | 2 | 19400 | 0 | 0 | 510.526315789474 | S100A9; S100A8 |
| GO:0071471 | cellular response to non-ionic osmotic stress | biological_process | 1 | 38 | 1 | 19400 | 0 | 0 | 510.526315789474 | PTGS2 |
| GO:0031249 | denatured protein binding | molecular_function | 1 | 38 | 1 | 19400 | 0 | 0 | 510.526315789474 | HSPA1A |
| GO:1902380 | positive regulation of endoribonuclease activity | biological_process | 1 | 38 | 1 | 19400 | 0 | 0 | 510.526315789474 | HSPA1A |
| GO:0097160 | polychlorinated biphenyl binding | molecular_function | 1 | 38 | 1 | 19400 | 0 | 0 | 510.526315789474 | SCGB1A1 |
| GO:1905572 | ganglioside GM1 transport to membrane | biological_process | 1 | 38 | 1 | 19400 | 0 | 0 | 510.526315789474 | PSAP |
| GO:1905574 | ganglioside GM2 binding | molecular_function | 1 | 38 | 1 | 19400 | 0 | 0 | 510.526315789474 | PSAP |
| GO:1905575 | ganglioside GM3 binding | molecular_function | 1 | 38 | 1 | 19400 | 0 | 0 | 510.526315789474 | PSAP |
| GO:1905577 | ganglioside GP1c binding | molecular_function | 1 | 38 | 1 | 19400 | 0 | 0 | 510.526315789474 | PSAP |
| GO:0019747 | regulation of isoprenoid metabolic process | biological_process | 1 | 38 | 1 | 19400 | 0 | 0 | 510.526315789474 | NPC2 |
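The GO enrichment top 30 bar chart (in each of the three categories, GO terms with more than 2 associated differential genes are selected and the 10 terms with the largest -log10 pValue are shown) is as follows: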
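The top 30 selection rule can be sketched as below. The keys `category`, `ListHits` and `pval` follow the column names of the example results table; the function names and the handling of a p value reported as 0 (treated as the most significant) are assumptions made for illustration.

```python
import math


def neg_log10(p):
    # A p value reported as 0 is treated as infinitely significant (assumption).
    return math.inf if p <= 0 else -math.log10(p)


def top_terms_per_category(rows, min_hits=2, per_category=10):
    """Keep terms with ListHits > min_hits, then take the top terms per category.

    rows: list of dicts with the keys 'category', 'ListHits' and 'pval'.
    Returns a dict mapping each category to its selected rows.
    """
    grouped = {}
    for row in rows:
        if row["ListHits"] > min_hits:
            grouped.setdefault(row["category"], []).append(row)
    return {
        category: sorted(items, key=lambda r: neg_log10(r["pval"]), reverse=True)[:per_category]
        for category, items in grouped.items()
    }
```

With three categories and ten terms each, the result holds at most 30 terms, which is where the "top 30" in the chart title comes from.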
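Enrichment analysis of the differential genes between samples is run separately for CC, BP and MF with the fisher algorithm, and topGO[6] is used to draw a directed acyclic graph of the enriched Terms. The topGO directed acyclic graph shows the GO nodes (Terms) enriched for differentially expressed genes together with their hierarchy, and serves as a graphical presentation of the GO enrichment results. Branches represent containment relationships, and the functional descriptions become more specific from top to bottom.
Figure 4.8.1.3 Example topGO directed acyclic graph for differential genes
Figure legend: enrichment is tested for each GO Term, and the 10 most significant nodes are drawn as rectangles. The rectangle color indicates enrichment significance, which increases from yellow to red. Each node is labeled with its basic information, namely the GO ID and GO Term.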
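By functional hierarchy, GO is generally divided into three levels. Level1 contains three entries: biological process, cellular component and molecular function. Level2 contains 64 entries such as biological adhesion, cell and binding. Level3 consists of the tens of thousands of entries used in routine enrichment. Function becomes more specific from level1 to level3 and more general in the opposite direction.
The GO Level2 distribution of differential genes compared with all genes is shown below:
Figure 4.8.1.4 GO Level2 distribution of differentially expressed genes and all genes
Figure legend: blue marks the GO Level2 entries for all genes and red marks the GO Level2 entries for differential genes. The horizontal axis shows the entry names, and the vertical axis shows the number and percentage of genes in each entry.
The GO Level2 distribution of upregulated compared with downregulated differential genes is shown below:
Figure 4.8.1.5 GO Level2 distribution of upregulated and downregulated differential genes
Figure legend: red marks the GO Level2 entries for upregulated differentially expressed genes and green marks the GO Level2 entries for downregulated differentially expressed genes. The horizontal axis shows the entry names, and the vertical axis shows the number and percentage of genes in each entry.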
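4.8.2. KEGG enrichment analysis of differential genes
KEGG[7] is the main public database for Pathways. The differentially expressed protein-coding genes are analyzed against the KEGG database (using the KEGG annotation results), and the significance of differential gene enrichment in each Pathway entry is computed with a hypergeometric test.
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database for the systematic analysis of gene function that links genomic information with functional information. It is used here to explore which pathways the differential genes may be associated with.
The enrichment calculation returns a p value for enrichment significance; a small p value indicates that differential genes are enriched in that Pathway. The formula is the same as for GO enrichment analysis. Pathway analysis offers clues for interpreting the experimental results: it identifies the Pathway entries enriched for differential genes and suggests which cellular pathway changes the differentially expressed protein-coding genes between samples may be related to.
KEGG enrichment results for differential genes: 8.enrichment/KEGG_enrichment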
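The KEGG enrichment top 20 bubble chart (Pathway entries with more than 2 associated differential genes are selected and ranked by -log10Pvalue from largest to smallest) is as follows:
Figure 4.8.2.1 KEGG enrichment top 20 bubble chart
Figure legend: the horizontal axis, Enrichment Score, is the enrichment score. Larger bubbles correspond to entries containing more differentially expressed protein-coding genes. The bubble color changes from purple through blue and green to red; the smaller the enrichment pValue, the greater the significance.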
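By functional hierarchy, KEGG is usually divided into three levels. Level1 contains six categories: Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, Organismal Systems and Human Diseases (some may be absent for a given species annotation). Level2 contains 44 categories such as Cell growth and death, Transcription and Development (some may be absent for a given species annotation). Level3 consists of the several hundred Pathways used in routine enrichment. Function becomes more specific from level1 to level3 and more general in the opposite direction.
The KEGG Level2 distribution of differentially expressed genes compared with all genes is shown below:
Figure 4.8.2.2 KEGG Level2 distribution of differentially expressed genes and all genes
Figure legend: the horizontal axis is the ratio (%) of the genes (differentially expressed genes) annotated to each Level2 pathway to the total number of genes (differentially expressed genes) annotated to KEGG pathways. The vertical axis gives the Level2 Pathway names, and the number to the right of each bar is the count of differentially expressed genes annotated to that Level2 Pathway.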
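The percentages plotted on the horizontal axis can be sketched as follows. The input mapping from gene to Level2 categories and the function name are hypothetical; the sketch also assumes that a gene annotated to several pathways within the same Level2 category is counted once for that category.

```python
from collections import defaultdict


def level2_distribution(gene_to_level2):
    """Return {level2_name: (gene_count, percent_of_annotated_genes)}.

    gene_to_level2: dict mapping a gene symbol to a list or set of the
    KEGG Level2 categories it is annotated to (hypothetical input layout).
    """
    members = defaultdict(set)
    for gene, categories in gene_to_level2.items():
        for category in categories:
            members[category].add(gene)
    # Denominator: genes with at least one KEGG annotation.
    total = sum(1 for categories in gene_to_level2.values() if categories)
    return {
        category: (len(genes), 100.0 * len(genes) / total)
        for category, genes in members.items()
    }
```

Running the same function once on the differential genes and once on all annotated genes gives the two series that the figure compares.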
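The KEGG Level2 distribution of upregulated and downregulated differentially expressed genes is shown below:
Figure 4.8.2.3 KEGG Level2 distribution of upregulated and downregulated differentially expressed genes
Figure legend: the horizontal axis is the ratio (%) of the upregulated (downregulated) differentially expressed genes annotated to each Level2 pathway to the total number of upregulated (downregulated) differentially expressed genes annotated to KEGG pathways. The vertical axis gives the Level2 Pathway names, and the number to the right of each bar is the count of upregulated (downregulated) differentially expressed genes annotated to that Level2 Pathway.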
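5. Appendix
5.1. Loupe-Browser tutorial
Loupe-Browser tutorial:
supplemental_material/Loupe-Browser使用教程.html
5.2. Experimental methods
Single-cell transcriptome experimental methods documents:
supplemental_material/欧易生物单细胞转录组实验技术方法说明_中文.pdf
supplemental_material/欧易生物单细胞转录组实验技术方法说明_英文.pdf
5.3. Bioinformatics analysis methods
Single-cell transcriptome bioinformatics analysis methods documents:
supplemental_material/欧易生物单细胞转录组生信分析方法_中文.pdf
supplemental_material/欧易生物单细胞转录组生信分析方法_英文.pdf
5.4. Frequently asked questions (FAQ)
Single-cell transcriptome frequently asked questions (FAQ):
supplemental_material/单细胞转录组常见问题(FAQ).pdf
5.5. AI figure-editing guide
Guide to the AI figure-editing tool:
supplemental_material/AI使用说明.html
5.6. Database information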
| Database | URL |
|---|---|
| Genome Database | https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest |
| GO Database | http://geneontology.org/ |
| KEGG Database | http://www.genome.jp/kegg/ |
5.7. Data analysis software
| Software | Version |
|---|---|
| Cell Ranger | 5.0.0 |
| Seurat | 3.1.1 |
| Monocle | 2 |
| FastQC | 0.11.7 |
| Destiny | 2.10.2 |
| Scran | 1.8.4 |
| MAGIC | 1.2.1 |
5.8. References
[1] https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger
[2] Dobin A, Davis C A, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner[J]. Bioinformatics, 2013, 29(1): 15-21.
[3] Aran D, Looney A P, Liu L, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage[J]. Nature immunology, 2019, 20(2): 163-172.
[4] Mabbott N A, Baillie J K, Brown H, et al. An expression atlas of human primary cells: inference of gene function from coexpression networks[J]. BMC genomics, 2013, 14(1): 632.
[5] The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong[J]. Nucleic acids research, 2019, 47(D1): D330-D338.
[6] Alexa A, Rahnenfuhrer J. topGO: enrichment analysis for gene ontology. R package version 2.8, 2010.
[7] Kanehisa M, Araki M, Goto S, et al. KEGG for linking genomes to life and the environment[J]. Nucleic acids research, 2008, 36(suppl 1): D480-D484.
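6. Statement
This project report is provided by 上海欧易生物医学科技有限公司 to the customers associated with the project. The company undertakes that, without the customer's consent, it will not disclose the data or the analysis content to any third party and will not use customer data for any commercial purpose (in accordance with the confidentiality agreement in the contract). The customer may not show the project report to any third party for any purpose without the company's consent.
上海欧易生物医学科技有限公司 reserves the right of final interpretation of this report.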